Abstract: Compressed sensing seeks to recover a sparse vector from
a small number of linear and non-adaptive measurements. While most
work so far focuses on Gaussian or Bernoulli random measurements, we
investigate the use of partial random circulant and Toeplitz matrices in
connection with recovery by $\ell_1$-minimization. In contrast to recent work
in this direction, we allow the use of an arbitrary subset of rows of
a circulant or Toeplitz matrix. Our recovery result predicts that the
necessary number of measurements to ensure sparse reconstruction by
$\ell_1$-minimization with random partial circulant or Toeplitz matrices scales
linearly in the sparsity up to a log-factor in the ambient dimension. This
represents a significant improvement over previous recovery results for
such matrices. As a main tool for the proofs we use a new version of the
non-commutative Khintchine inequality.

I. Introduction 

Compressed sensing is a recent concept in signal processing where
one seeks to efficiently reconstruct a sparse signal from a minimal
number of linear and non-adaptive measurements [1]. So far various
measurement matrices have been investigated, most of them random
matrices. Among these are Bernoulli and Gaussian matrices [2] (with
independent ±1 or standard normal entries) as well as partial Fourier
matrices [3], [4], [5]. Recently, Bajwa et al. [6] (see also [7]) studied
Toeplitz type and circulant matrices in the context of compressed
sensing where the entries of the vector generating the Toeplitz
or circulant matrix are chosen at random according to a suitable
probability distribution. Compared to Bernoulli or Gaussian matrices,
random Toeplitz and circulant matrices have the advantage that they
require a reduced number of random numbers to be generated. More
importantly, there are fast matrix-vector multiplication routines which
can be exploited in recovery algorithms. Furthermore, they arise
naturally in certain applications such as identifying a linear
time-invariant system [8].

Basis Pursuit ($\ell_1$-minimization) is one of the major approaches
to efficiently recover a sparse vector. This technique is quite well
understood by now. Modern optimization algorithms [9] such as
LARS [10] (sometimes called homotopy method) are reasonably fast.

Bajwa et al. [6], [8] estimated the so-called restricted isometry
constants of a random Toeplitz type or circulant matrix, which then
yields recovery guarantees for $\ell_1$-minimization. However,
their bound is very pessimistic compared to related estimates for
Bernoulli / Gaussian or partial Fourier matrices. More precisely, the
estimated number of measurements grows with the sparsity squared,
while one would rather expect a linear scaling. Indeed, this is also
suggested by numerical experiments. We close the theoretical gap by
providing recovery guarantees for $\ell_1$-minimization in connection with
circulant and Toeplitz type matrices where the necessary number of
measurements scales linearly with the sparsity. However, we do not
make use of the restricted isometry constants, and a good estimate of
the latter is therefore still open.

II. Sparse recovery with circulant and Toeplitz matrices

For a vector $x \in \mathbb{R}^N$ let $\operatorname{supp} x = \{j : x_j \neq 0\}$ denote its
support and $\|x\|_0 = |\operatorname{supp} x|$ the number of non-zero entries. It is
called s-sparse if $\|x\|_0 \le s$. We aim at recovering x from $y = Ax \in \mathbb{R}^n$,
where A is a suitable $n \times N$ measurement matrix and $n < N$.
A natural strategy is to consider $\ell_0$-minimization,

$$\min_x \|x\|_0 \quad \text{subject to } Ax = y. \tag{1}$$

Unfortunately this combinatorial optimization problem is NP hard in
general [11]. Therefore, we solve instead the convex problem

$$\min_x \|x\|_1 \quad \text{subject to } Ax = y, \tag{2}$$

where the $\ell_p$-norm is defined as usual, $\|x\|_p = \big(\sum_j |x_j|^p\big)^{1/p}$. It

is by now well understood that the solutions of both minimization
problems often coincide and are equal to the original vector x, see
e.g. [12], [13], [1], [14], [15]. A by now popular result [12], [16], [17]
states that indeed (2) (stably) recovers all s-sparse x from y = Ax
provided the restricted isometry constant satisfies $\delta_{2s} \le \delta < \sqrt{2} - 1$. The
latter means that

$$(1-\delta)\|x\|_2^2 \le \|Ax\|_2^2 \le (1+\delta)\|x\|_2^2$$

for all 2s-sparse vectors x. It is known [2] that random Gaussian or
Bernoulli matrices, i.e. $n \times N$ matrices with independent and normally
distributed or Bernoulli distributed entries, satisfy this condition with
probability at least $1 - \epsilon$ provided $n \ge C_1 s \log(N/s) + C_2 \log(\epsilon^{-1})$.

We consider the following types of measurement matrices. For
$b = (b_0, b_1, \dots, b_{N-1}) \in \mathbb{R}^N$ we define its associated circulant matrix
$S = S^b \in \mathbb{R}^{N \times N}$ by the entries $S_{i,j} = b_{j-i \bmod N}$, where $i,j =
1,\dots,N$. Similarly, for a vector $c = (c_{-N+1}, c_{-N+2}, \dots, c_{N-1})$
its associated Toeplitz matrix $T = T^c \in \mathbb{R}^{N \times N}$ has entries $T_{i,j} =
c_{j-i}$, where $i,j = 1,\dots,N$. Now we choose an arbitrary subset
$\Omega \subset \{1, \dots, N\}$ of cardinality $n < N$ and let the partial circulant
matrix $S_\Omega = S^b_\Omega \in \mathbb{R}^{n \times N}$ be the submatrix of S consisting of the
rows indexed by $\Omega$. The partial Toeplitz matrix $T_\Omega = T^c_\Omega \in \mathbb{R}^{n \times N}$
is defined similarly. In this paper the vectors b and c will always be
random vectors with independent Bernoulli ±1 entries.
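The definitions above can be made concrete with a short NumPy sketch. It is only an illustration: the helper names (`circulant_matrix`, `toeplitz_matrix`, `partial_rows`) and the small sizes are assumptions of this example, and indices are 0-based, which leaves the differences $j - i$ unchanged.

```python
import numpy as np

def circulant_matrix(b):
    """N x N circulant matrix with entries S[i, j] = b[(j - i) mod N]."""
    b = np.asarray(b)
    N = b.size
    idx = (np.arange(N)[None, :] - np.arange(N)[:, None]) % N  # idx[i, j] = (j - i) mod N
    return b[idx]

def toeplitz_matrix(c):
    """N x N Toeplitz matrix T[i, j] = c_{j-i}; c stores c_{-N+1}, ..., c_{N-1}."""
    c = np.asarray(c)
    N = (c.size + 1) // 2
    idx = np.arange(N)[None, :] - np.arange(N)[:, None] + (N - 1)  # map j - i to 0..2N-2
    return c[idx]

def partial_rows(M, omega):
    """Submatrix consisting of the rows indexed by omega (0-based)."""
    return M[np.asarray(omega), :]

rng = np.random.default_rng(0)            # fixed seed, chosen only for reproducibility
N, n = 8, 3
b = rng.choice([-1, 1], size=N)           # Bernoulli +-1 generator of the circulant matrix
c = rng.choice([-1, 1], size=2 * N - 1)   # Bernoulli +-1 generator of the Toeplitz matrix
omega = np.sort(rng.choice(N, size=n, replace=False))  # arbitrary row subset

S_omega = partial_rows(circulant_matrix(b), omega)
T_omega = partial_rows(toeplitz_matrix(c), omega)
print(S_omega.shape, T_omega.shape)
```

Only $N$ (respectively $2N-1$) random signs are drawn, compared with $nN$ for an unstructured Bernoulli matrix.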

Of particular interest is the case $N = nK$ for some $K \in \mathbb{N}$
and $\Omega = \{K, 2K, \dots, nK\}$. Then the application of $S_\Omega$ and
$T_\Omega$ corresponds to (periodic or non-periodic) convolution with the
sequence b (or c, respectively) followed by a downsampling by a
factor of K. This setting was studied numerically in [18] by Tropp
et al. (using orthogonal matching pursuit instead of $\ell_1$-minimization).
Also of interest is the case $\Omega = \{1, 2, \dots, n\}$, which was investigated
in [6], [8] by Bajwa et al., who showed that the restricted isometry
constant of $T_\Omega$ satisfies $\delta_s \le \delta$ with high probability (w.h.p.) provided
$n \ge C_\delta s^2 \log(N/s)$. As a byproduct of the proof of our main result
we give an alternative proof that $\delta_s \le \delta$ holds w.h.p. under the
condition $n \ge C\delta^{-2}s^2 \log^2(N)$. However, we strongly believe that
this bound is not optimal due to the quite pessimistic quadratic scaling
in s. Our main result shows that one can achieve recovery w.h.p. by
$\ell_1$-minimization if $n \ge Cs\log^3(N)$.
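The downsampled-convolution setting lends itself to a fast implementation. The following sketch is an illustration, not part of the original analysis: it applies $S_\Omega$ with $\Omega = \{K, 2K, \dots, nK\}$ through the FFT and compares with the dense matrix. The function name `apply_partial_circulant_fft` and the sizes are assumptions of this example. With the convention $S_{i,j} = b_{j-i \bmod N}$ the product $Sx$ is a circular cross-correlation, which is why the conjugate of the DFT of b appears.

```python
import numpy as np

def apply_partial_circulant_fft(b, x, omega):
    """Compute S_Omega x for S[i, j] = b[(j - i) mod N] in O(N log N) operations.

    For real b, the DFT of S x equals conj(fft(b)) * fft(x) (correlation theorem).
    """
    full = np.fft.ifft(np.conj(np.fft.fft(b)) * np.fft.fft(x)).real
    return full[np.asarray(omega)]

rng = np.random.default_rng(1)            # fixed seed for reproducibility
n, K = 4, 3
N = n * K
b = rng.choice([-1.0, 1.0], size=N)       # Bernoulli +-1 generator
x = rng.standard_normal(N)                # arbitrary test vector
omega = np.arange(K, N + 1, K) - 1        # rows K, 2K, ..., nK in 0-based indexing

# Dense reference computation for comparison.
idx = (np.arange(N)[None, :] - np.arange(N)[:, None]) % N
S = b[idx]
y_dense = S[omega] @ x
y_fft = apply_partial_circulant_fft(b, x, omega)
print(np.max(np.abs(y_dense - y_fft)))
```

The dense product costs $O(nN)$ operations, whereas the FFT route costs $O(N \log N)$ regardless of which rows are kept.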

In the following recovery theorem we use a random partial circulant
or Toeplitz matrix $S_\Omega$ or $T_\Omega$ in the sense that the entries of
the vector b or c are independent Bernoulli ±1 random variables.
Furthermore, the signs of the non-zero entries of the s-sparse vector
x are chosen at random according to a Bernoulli distribution as well.
In contrast to previous work [6], [18], $\Omega$ is allowed to be an arbitrary
subset of $\{1, \dots, N\}$ of cardinality n.

Theorem 2.1: Let $\Omega \subset \{1, 2, \dots, N\}$ be an arbitrary (deterministic)
set of cardinality n. Let $x \in \mathbb{R}^N$ be s-sparse such that the signs
of its non-zero entries are Bernoulli ±1 random variables. Choose
$b \in \mathbb{R}^N$ to be a random vector whose entries are ±1 Bernoulli
variables. Let $y = S^b_\Omega x \in \mathbb{R}^n$. There exists a constant $C > 0$ such
that

$$n \ge Cs \log^3(N/\epsilon)$$

implies that with probability at least $1 - \epsilon$ the solution of the
$\ell_1$-minimization problem (2) coincides with x.

The same statement holds with $T^c_\Omega$ in place of $S^b_\Omega$, where
$c \in \mathbb{R}^{2N-1}$ is a random vector with Bernoulli ±1 entries.

Ignoring the log-factor, the necessary number of samples ensuring
recovery by $\ell_1$-minimization scales linearly with the sparsity s.
The power 3 at the log-term can very likely be improved to 1,
and moreover, it seems also possible to remove the randomness
assumption on the non-zero coefficients of x. We postpone such
improvements as well as an investigation of the restricted isometry
constants to possible future contributions. The remainder of the paper
is concerned with the proof of Theorem 2.1.

III. Proof of Theorem 2.1

An essential ingredient of the proof is the following recovery
theorem for $\ell_1$-minimization due to Fuchs [19] and Tropp [20]. For
a matrix A we denote by $a_\rho$ its columns and by $A_\Lambda$ the submatrix
consisting only of the columns indexed by $\Lambda$.

Theorem 3.1: Suppose that $y = Ax$ for some x with $\operatorname{supp} x = \Lambda$.
If

$$|\langle A_\Lambda^\dagger a_\rho, \operatorname{sgn}(x_\Lambda)\rangle| < 1 \quad \text{for all } \rho \notin \Lambda, \tag{3}$$

then x is the unique solution of the Basis Pursuit problem (2). Here,
$A_\Lambda^\dagger$ denotes the Moore-Penrose pseudo-inverse of $A_\Lambda$.
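Condition (3) is easy to evaluate numerically for a given matrix and sparse vector. The sketch below is a hedged illustration for real matrices; the function name `fuchs_tropp_condition`, the tolerance, and the example sizes are assumptions of this example and not part of the original text.

```python
import numpy as np

def fuchs_tropp_condition(A, x, tol=1e-12):
    """Evaluate max over rho outside the support of |<pinv(A_Lambda) a_rho, sgn(x_Lambda)>|.

    Returns (holds, worst), where holds is True when the maximum is < 1,
    i.e. when condition (3) is satisfied for real A and x.
    """
    support = np.flatnonzero(np.abs(x) > tol)
    off_support = np.setdiff1d(np.arange(A.shape[1]), support)
    pinv = np.linalg.pinv(A[:, support])              # Moore-Penrose pseudo-inverse
    signs = np.sign(x[support])
    values = np.abs((pinv @ A[:, off_support]).T @ signs)
    worst = float(values.max()) if values.size else 0.0
    return bool(worst < 1.0), worst

# Example with a normalized random partial circulant matrix (illustrative sizes).
rng = np.random.default_rng(2)
N, n, s = 64, 32, 3
b = rng.choice([-1.0, 1.0], size=N)
idx = (np.arange(N)[None, :] - np.arange(N)[:, None]) % N
omega = np.sort(rng.choice(N, size=n, replace=False))
A = b[idx][omega] / np.sqrt(n)

x = np.zeros(N)
support = rng.choice(N, size=s, replace=False)
x[support] = rng.choice([-1.0, 1.0], size=s)          # random signs on the support
print(fuchs_tropp_condition(A, x))
```

Whether the condition holds for a particular draw is random; Theorem 2.1 bounds the probability that it fails.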

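A crucial step in applying this theorem is to show that the $\ell_2$-norm
of $A_\Lambda^\dagger a_\rho$ in (3) is small. To this end one expands

$$\|A_\Lambda^\dagger a_\rho\|_2 = \|(A_\Lambda^* A_\Lambda)^{-1} A_\Lambda^* a_\rho\|_2 \le \|(A_\Lambda^* A_\Lambda)^{-1}\|_{2\to 2}\, \|A_\Lambda^* a_\rho\|_2, \tag{4}$$

where $\|\cdot\|_{2\to 2}$ denotes the operator norm on $\ell_2$. The second term
can be estimated in terms of the coherence of A, which is defined
to be the largest absolute inner product of different columns of A,
$\mu = \max_{\rho \neq \lambda} |\langle a_\rho, a_\lambda\rangle|$. Indeed, for $\rho \notin \Lambda$,

$$\|A_\Lambda^* a_\rho\|_2 = \Big(\sum_{\lambda \in \Lambda} |\langle a_\lambda, a_\rho\rangle|^2\Big)^{1/2} \le \sqrt{|\Lambda|}\,\mu.$$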

The coherence of a random Toeplitz or circulant matrix can be 
bounded as follows. 

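Proposition 3.2: Let $\mu$ be the coherence of the random partial
circulant matrix $\frac{1}{\sqrt{n}} S^b_\Omega \in \mathbb{R}^{n \times N}$ or Toeplitz matrix
$\frac{1}{\sqrt{n}} T^c_\Omega \in \mathbb{R}^{n \times N}$,
where b and c are Rademacher series and $\Omega$ has cardinality n. Then
with probability at least $1 - \epsilon$ the coherence satisfies

$$\mu \le \frac{4 \log(2N^2/\epsilon)}{\sqrt{n}}.$$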
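The coherence and the bound of Proposition 3.2 can be compared numerically. The sketch below is illustrative only; the function names `coherence` and `coherence_bound` and the sizes are assumptions of this example. Because the columns of the normalized matrix have unit norm, the coherence never exceeds 1, so the bound only becomes informative once $\sqrt{n}$ exceeds $4\log(2N^2/\epsilon)$, which requires rather large n.

```python
import numpy as np

def coherence(A):
    """Largest absolute inner product between two different columns of A."""
    G = A.T @ A
    np.fill_diagonal(G, 0.0)      # ignore the inner products of columns with themselves
    return float(np.abs(G).max())

def coherence_bound(N, n, eps):
    """Right hand side of Proposition 3.2: 4 log(2 N^2 / eps) / sqrt(n)."""
    return 4.0 * np.log(2.0 * N**2 / eps) / np.sqrt(n)

rng = np.random.default_rng(3)    # fixed seed for reproducibility
N, n, eps = 256, 64, 0.1
b = rng.choice([-1.0, 1.0], size=N)
idx = (np.arange(N)[None, :] - np.arange(N)[:, None]) % N
omega = np.sort(rng.choice(N, size=n, replace=False))
A = b[idx][omega] / np.sqrt(n)    # normalized partial circulant matrix

mu = coherence(A)
print(mu, coherence_bound(N, n, eps))
```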



The proof is contained in Section V. This proposition easily implies
the following (probably non-optimal) estimate of the restricted
isometry constants of $S_\Omega$ or $T_\Omega$, contained also in [8] with a different
proof.



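Corollary 3.3: Let $\frac{1}{\sqrt{n}} S^b_\Omega, \frac{1}{\sqrt{n}} T^c_\Omega \in \mathbb{R}^{n \times N}$ be the randomly
generated normalized partial circulant and Toeplitz matrices generated
from Rademacher series and let $\delta_s$ be their restricted isometry constant.
Assume that

$$n \ge 16\,\delta^{-2} s^2 \log^2(2N^2/\epsilon).$$

Then with probability at least $1 - \epsilon$ it holds that $\delta_s \le \delta$.

Proof: Combine the bound $\delta_s \le (s-1)\mu$ (which easily follows from
Gershgorin's disk theorem) with the estimate above on the coherence
of $A = \frac{1}{\sqrt{n}} S^b_\Omega$ or $A = \frac{1}{\sqrt{n}} T^c_\Omega$. ∎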
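To get a feeling for the quadratic scaling in Corollary 3.3, the following sketch simply evaluates the stated sufficient number of measurements. The function name `n_rip_via_coherence` and the parameter values are assumptions of this illustration.

```python
import math

def n_rip_via_coherence(s, N, delta, eps):
    """Sufficient n from Corollary 3.3: 16 * delta^-2 * s^2 * log^2(2 N^2 / eps)."""
    return 16.0 * s**2 * math.log(2.0 * N**2 / eps) ** 2 / delta**2

N, delta, eps = 10**6, 0.5, 0.1
for s in (1, 2, 4, 8):
    # Doubling s multiplies the sufficient number of measurements by four.
    print(s, round(n_rip_via_coherence(s, N, delta, eps)))
```

For these illustrative parameters the sufficient n quickly exceeds N itself, which is one way to see why the quadratic dependence on s is regarded as pessimistic.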

As suggested by (4) we also need an estimate of the operator norm
of the inverse of $A_\Lambda^* A_\Lambda$. To this end we bound the smallest and largest
eigenvalue of this matrix.

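Theorem 3.4: Let $\Omega, \Lambda \subset \{1, \dots, N\}$ with $|\Omega| = n$ and $|\Lambda| = s$.
Let $b \in \mathbb{R}^N$ and $c \in \mathbb{R}^{2N-1}$ be Rademacher series. Denote either
$A = \frac{1}{\sqrt{n}} S^b_\Omega$ or $A = \frac{1}{\sqrt{n}} T^c_\Omega$. Assume

$$n \ge C\delta^{-2} s \log^2(4s/\epsilon), \tag{5}$$

where $C = 4\pi^2 \approx 39.48$. Then with probability at least $1 - \epsilon$ the
minimal and maximal eigenvalues $\lambda_{\min}$ and $\lambda_{\max}$ of $A_\Lambda^* A_\Lambda$ satisfy

$$1 - \delta \le \lambda_{\min} \le \lambda_{\max} \le 1 + \delta.$$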

Note that the above theorem holds for a fixed subset $\Lambda$ and random
coefficients b or c. It does not imply that for given b or c the estimate
holds uniformly for all subsets $\Lambda$, which would be equivalent to
having an estimate for the restricted isometry constants of $\frac{1}{\sqrt{n}} S_\Omega$
or $\frac{1}{\sqrt{n}} T_\Omega$. (Note that taking a union bound over all subsets $\Lambda$ would
yield an estimate essentially worse than Corollary 3.3.)
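The statement of Theorem 3.4 concerns the spectrum of $A_\Lambda^* A_\Lambda$ for one fixed support set. A small numerical sketch shows how this spectrum can be inspected; the helper names `gram_eigen_range` and `n_required` and the sizes are assumptions of this example. The sizes used are far below condition (5), which is only a sufficient condition.

```python
import numpy as np

def gram_eigen_range(A, support):
    """Smallest and largest eigenvalue of A_Lambda^* A_Lambda for real A."""
    sub = A[:, np.asarray(support)]
    eig = np.linalg.eigvalsh(sub.T @ sub)   # eigenvalues in ascending order
    return float(eig[0]), float(eig[-1])

def n_required(s, delta, eps):
    """Right hand side of condition (5) with C = 4 pi^2."""
    return 4.0 * np.pi**2 * s * np.log(4.0 * s / eps) ** 2 / delta**2

rng = np.random.default_rng(4)              # fixed seed for reproducibility
N, n, s = 512, 128, 5
b = rng.choice([-1.0, 1.0], size=N)
idx = (np.arange(N)[None, :] - np.arange(N)[:, None]) % N
omega = np.sort(rng.choice(N, size=n, replace=False))
A = b[idx][omega] / np.sqrt(n)              # normalized partial circulant matrix
support = np.sort(rng.choice(N, size=s, replace=False))

lam_min, lam_max = gram_eigen_range(A, support)
print(lam_min, lam_max, n_required(s, 0.5, 0.1))
```

Since the columns of A have unit norm, the eigenvalues average to 1, so $\lambda_{\min} \le 1 \le \lambda_{\max}$ always holds; the theorem controls how far they can deviate.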

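Now we are ready to complete the proof of Theorem 2.1 on the
basis of Proposition 3.2 and Theorem 3.4. We proceed similarly as
in [21, Theorem 14]. Hoeffding's inequality states that for a vector a
and a sequence $(\epsilon_j)$ of independent Bernoulli ±1 random variables,

$$P\Big(\Big|\sum_j \epsilon_j a_j\Big| \ge u \|a\|_2\Big) \le 2 e^{-u^2/2}. \tag{6}$$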



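By our assumption on the random phases $\epsilon_\lambda = \operatorname{sgn}(x_\lambda)$, the scalar
product on the left hand side of (3) is precisely of the above form
with $a = A_\Lambda^\dagger a_\rho = (A_\Lambda^* A_\Lambda)^{-1} A_\Lambda^* a_\rho$. Theorem 3.4 implies that
the smallest eigenvalue of $A_\Lambda^* A_\Lambda$ is bounded from below by $1 - \delta$
with probability at least $1 - \epsilon$ provided condition (5) holds; hence,
$\|(A_\Lambda^* A_\Lambda)^{-1}\|_{2\to 2} \le \frac{1}{1-\delta}$. Plugging this into (4) yields

$$\|A_\Lambda^\dagger a_\rho\|_2 \le \frac{\|A_\Lambda^* a_\rho\|_2}{1 - \delta} \le \frac{\sqrt{s}\,\mu}{1 - \delta}. \tag{7}$$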
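Hoeffding's inequality (6) can be checked exactly for short vectors by enumerating all sign patterns. The following sketch does this with the standard library only; the function name `rademacher_tail` and the test vector are assumptions of this illustration.

```python
import itertools
import math

def rademacher_tail(a, u):
    """Exact P(|sum_j eps_j a_j| >= u * ||a||_2) for independent +-1 signs eps_j."""
    norm = math.sqrt(sum(v * v for v in a))
    hits = 0
    for signs in itertools.product((-1, 1), repeat=len(a)):
        if abs(sum(e * v for e, v in zip(signs, a))) >= u * norm:
            hits += 1
    return hits / 2 ** len(a)

a = [0.5, -1.0, 2.0, 0.25, 1.5, -0.75, 1.0, 0.1]   # arbitrary test vector
for u in (0.5, 1.0, 1.5, 2.0, 2.5):
    exact = rademacher_tail(a, u)
    bound = 2.0 * math.exp(-u * u / 2.0)
    print(u, exact, bound)
```

In the proof, the role of a is played by $A_\Lambda^\dagger a_\rho$, whose norm is controlled by (7).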



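Following Theorem 3.1, for a parameter $\sigma > 0$ to be chosen below, the
probability that recovery fails can be estimated by

$$\begin{aligned}
&P\big(|\langle A_\Lambda^\dagger a_\rho, R_\Lambda \operatorname{sgn}(x)\rangle| \ge 1 \text{ for some } \rho \notin \Lambda\big) \\
&\quad \le P\Big(|\langle A_\Lambda^\dagger a_\rho, R_\Lambda \operatorname{sgn}(x)\rangle| \ge 1 \text{ for some } \rho \notin \Lambda \,\Big|\, \mu \le \tfrac{\sigma}{\sqrt{n}} \ \&\ \lambda_{\min} \ge 1 - \delta\Big) \\
&\qquad + P\Big(\mu > \tfrac{\sigma}{\sqrt{n}}\Big) + P(\lambda_{\min} < 1 - \delta) \\
&\quad \le \sum_{\rho \notin \Lambda} P\Big(|\langle A_\Lambda^\dagger a_\rho, R_\Lambda \operatorname{sgn}(x)\rangle| \ge 1 \,\Big|\, \mu \le \tfrac{\sigma}{\sqrt{n}} \ \&\ \lambda_{\min} \ge 1 - \delta\Big) \\
&\qquad + P\Big(\mu > \tfrac{\sigma}{\sqrt{n}}\Big) + P(\lambda_{\min} < 1 - \delta).
\end{aligned}$$

Under the assumption $\mu \le \sigma/\sqrt{n}$, equation (7) implies that for
$u = \frac{(1-\delta)\sqrt{n}}{\sigma\sqrt{s}}$ we have $u\|A_\Lambda^\dagger a_\rho\|_2 \le 1$, so (6) gives

$$P\Big(|\langle A_\Lambda^\dagger a_\rho, R_\Lambda \operatorname{sgn}(x)\rangle| \ge 1 \,\Big|\, \mu \le \tfrac{\sigma}{\sqrt{n}} \ \&\ \lambda_{\min} \ge 1 - \delta\Big) \le 2 \exp\Big(-\frac{n(1-\delta)^2}{2\sigma^2 s}\Big). \tag{8}$$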
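Setting $\sigma = 4\log(2N^2/\epsilon)$, Proposition 3.2 yields

$$P\Big(\mu > \frac{\sigma}{\sqrt{n}}\Big) \le \epsilon.$$

Now we choose $\delta = 1/2$. Under condition (5), which reads

$$n \ge 4Cs\log^2(4s/\epsilon), \tag{9}$$

we have $P(\lambda_{\min} < 1 - \delta) \le \epsilon$. Hence, under the above conditions
we obtain from (8)

$$P\big(|\langle A_\Lambda^\dagger a_\rho, R_\Lambda \operatorname{sgn}(x)\rangle| \ge 1 \text{ for some } \rho \notin \Lambda\big) \le 2N \exp\Big(-\frac{n}{8 s \sigma^2}\Big) + 2\epsilon. \tag{10}$$

The first term is less than $\epsilon$ provided
$n \ge 8 s \sigma^2 \log(2N/\epsilon) = 128\, s \log^2(2N^2/\epsilon) \log(2N/\epsilon)$, or

$$n \ge C_1 s \log^3(N/\epsilon) \tag{11}$$

for a suitable constant $C_1$. Conditions (9) and (11) are both satisfied
if

$$n \ge Cs\log^3(N/\epsilon)$$

for a suitable constant C, in which case the probability that recovery
by $\ell_1$-minimization fails is less than $3\epsilon$. This completes the proof.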

IV. Non-commutative Khintchine inequalities 

Both the proof of Proposition 3.2 as well as the proof of Theorem
3.4 are based on versions of the Khintchine inequality. Let us first
state the non-commutative Khintchine inequality due to Lust-Piquard
[22] and Buchholz [23], see also [21]. To this end we introduce
Schatten class norms on matrices. Denoting by $\sigma(A)$ the vector of
singular values of a matrix A, the $S_p$-norm is defined as

$$\|A\|_{S_p} := \|\sigma(A)\|_p,$$

where $\|\cdot\|_p$ is the usual $\ell_p$-norm, $1 \le p \le \infty$.

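Theorem 4.1: Let $(A_k)$ be a finite sequence of matrices of the
same dimension and let $(g_k)$ be a sequence of independent standard
Gaussian random variables. Then for $m \in \mathbb{N}$,

$$\Big[E\Big\|\sum_k g_k A_k\Big\|_{S_{2m}}^{2m}\Big]^{1/2m} \le B_m \max\Big\{\Big\|\Big(\sum_k A_k A_k^*\Big)^{1/2}\Big\|_{S_{2m}}, \Big\|\Big(\sum_k A_k^* A_k\Big)^{1/2}\Big\|_{S_{2m}}\Big\}$$

with optimal constant

$$B_m = \Big(\frac{(2m)!}{2^m m!}\Big)^{1/2m}.$$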

Using the contraction principle for Bernoulli random variables, see
[24, eq. (4.8)], we obtain the non-commutative Khintchine inequality
for Bernoulli random variables [22].

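Corollary 4.2: Let $(A_k)$ be a finite sequence of matrices of the
same dimension and let $(\epsilon_k)$ be a sequence of independent Bernoulli
±1 random variables. Then for $m \in \mathbb{N}$,

$$\Big[E\Big\|\sum_k \epsilon_k A_k\Big\|_{S_{2m}}^{2m}\Big]^{1/2m} \le C_m \max\Big\{\Big\|\Big(\sum_k A_k A_k^*\Big)^{1/2}\Big\|_{S_{2m}}, \Big\|\Big(\sum_k A_k^* A_k\Big)^{1/2}\Big\|_{S_{2m}}\Big\} \tag{12}$$

with constant

$$C_m = \sqrt{\frac{\pi}{2}}\,\Big(\frac{(2m)!}{2^m m!}\Big)^{1/2m}.$$

In the scalar case the factor $\sqrt{\pi/2}$ can be removed. However, it is not
clear yet whether this is true also in the non-commutative situation.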
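In the scalar case, where the factor $\sqrt{\pi/2}$ is not needed, the inequality reads $E|\sum_k \epsilon_k a_k|^{2m} \le \frac{(2m)!}{2^m m!}\|a\|_2^{2m}$ and can be verified exactly for small integer vectors. The sketch below uses exact rational arithmetic; the function names and the test vector are assumptions of this illustration.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def rademacher_moment(a, m):
    """Exact E|sum_k eps_k a_k|^(2m) for integer coefficients a and +-1 signs."""
    total = sum(
        sum(e * v for e, v in zip(signs, a)) ** (2 * m)
        for signs in product((-1, 1), repeat=len(a))
    )
    return Fraction(total, 2 ** len(a))

def khintchine_bound(a, m):
    """Scalar Khintchine bound (2m)! / (2^m m!) * ||a||_2^(2m)."""
    const = Fraction(factorial(2 * m), 2**m * factorial(m))
    return const * sum(v * v for v in a) ** m

a = [1, -2, 3, 1, 2, -1]   # arbitrary integer test vector
for m in (1, 2, 3, 4):
    print(m, rademacher_moment(a, m), khintchine_bound(a, m))
```

For $m = 1$ both sides coincide, since $E|\sum_k \epsilon_k a_k|^2 = \|a\|_2^2$.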

The following theorem extends the non-commutative Khintchine
inequality to a second order chaos variable. Its proof uses decoupling
and Corollary 4.2.

Theorem 4.3: Let $A_{j,k} \in \mathbb{C}^{r \times t}$, $j,k = 1,\dots,N$, be matrices
with $A_{j,j} = 0$, $j = 1, \dots, N$. Let $\epsilon_k$, $k = 1, \dots, N$, be independent
Bernoulli random variables. Then for $m \in \mathbb{N}$ it holds

$$\Big[E\Big\|\sum_{j,k=1}^N \epsilon_j \epsilon_k A_{j,k}\Big\|_{S_{2m}}^{2m}\Big]^{1/2m} \le D_m \max\Big\{\Big\|\Big(\sum_{j,k=1}^N A_{j,k} A_{j,k}^*\Big)^{1/2}\Big\|_{S_{2m}}, \Big\|\Big(\sum_{j,k=1}^N A_{j,k}^* A_{j,k}\Big)^{1/2}\Big\|_{S_{2m}}, \|F\|_{S_{2m}}\Big\},$$

where F is the block matrix $F = (A_{j,k})_{j,k=1}^N$ and the constant is

$$D_m = 2^{1/2m}\, 2\pi\, \Big(\frac{(2m)!}{2^m m!}\Big)^{1/m}.$$

At present it is not clear whether the term $\|F\|_{S_{2m}}$ can be omitted
above. At least, there is no a priori inequality between any of
the terms in the maximum. The proof of the theorem is based on
the following decoupling lemma, see [25, Proposition 1.9] or [26,
Theorem 3.1.1].

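Lemma 4.4: Let $\xi_j$, $j = 1, \dots, N$, be a sequence of independent
random variables with $E\xi_j = 0$ for all $j = 1, \dots, N$. Let $A_{j,k}$,
$j,k = 1, \dots, N$, be a double sequence of elements in a Banach
space with norm $\|\cdot\|$, where $A_{j,j} = 0$ for all $j = 1, \dots, N$. Then
for $1 \le p < \infty$

$$E\Big\|\sum_{j,k=1}^N \xi_j \xi_k A_{j,k}\Big\|^p \le 4^p\, E\Big\|\sum_{j,k=1}^N \xi_j \xi_k' A_{j,k}\Big\|^p,$$

where $\xi'$ denotes an independent copy of the sequence $\xi = (\xi_j)$.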

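Proof of Theorem 4.3: We apply Lemma 4.4 followed by the
non-commutative Khintchine inequality (12),

$$\begin{aligned}
E := E\Big\|\sum_{j,k=1}^N \epsilon_j \epsilon_k A_{j,k}\Big\|_{S_{2m}}^{2m}
&\le 4^{2m} E_\epsilon E_{\epsilon'} \Big\|\sum_{j,k=1}^N \epsilon_j \epsilon_k' A_{j,k}\Big\|_{S_{2m}}^{2m} \\
&\le 4^{2m} C_m^{2m} E_\epsilon \max\Big\{\Big\|\Big(\sum_k B_k(\epsilon)^* B_k(\epsilon)\Big)^{1/2}\Big\|_{S_{2m}}^{2m}, \Big\|\Big(\sum_k B_k(\epsilon) B_k(\epsilon)^*\Big)^{1/2}\Big\|_{S_{2m}}^{2m}\Big\},
\end{aligned} \tag{13}$$

where $B_k(\epsilon) := \sum_{j=1}^N \epsilon_j A_{j,k}$. We define

$$\hat{A}_{j,k} = (0|\dots|0|A_{j,k}|0|\dots|0) \in \mathbb{C}^{r \times Nt},$$

where the non-zero block $A_{j,k}$ is the k-th one, and similarly

$$\tilde{A}_{j,k} = (0|\dots|0|A_{j,k}^*|0|\dots|0)^* \in \mathbb{C}^{Nr \times t}.$$

Then clearly

$$\hat{A}_{j,k} \hat{A}_{j',k'}^* = \begin{cases} 0 & \text{if } k \neq k', \\ A_{j,k} A_{j',k}^* & \text{if } k = k', \end{cases}
\qquad
\tilde{A}_{j,k}^* \tilde{A}_{j',k'} = \begin{cases} 0 & \text{if } k \neq k', \\ A_{j,k}^* A_{j',k} & \text{if } k = k'. \end{cases} \tag{14}$$

The Schatten class norm satisfies $\|A\|_{S_{2m}} = \|(AA^*)^{1/2}\|_{S_{2m}}$. This
allows us to verify that

$$\Big\|\Big(\sum_k B_k(\epsilon) B_k(\epsilon)^*\Big)^{1/2}\Big\|_{S_{2m}}
= \Big\|\Big(\Big(\sum_{j=1}^N \epsilon_j \sum_{k=1}^N \hat{A}_{j,k}\Big)\Big(\sum_{j'=1}^N \epsilon_{j'} \sum_{k'=1}^N \hat{A}_{j',k'}\Big)^*\Big)^{1/2}\Big\|_{S_{2m}}
= \Big\|\sum_{j=1}^N \epsilon_j \sum_{k=1}^N \hat{A}_{j,k}\Big\|_{S_{2m}}.$$

Similarly, we also verify that

$$\Big\|\Big(\sum_k B_k(\epsilon)^* B_k(\epsilon)\Big)^{1/2}\Big\|_{S_{2m}} = \Big\|\sum_{j=1}^N \epsilon_j \sum_{k=1}^N \tilde{A}_{j,k}\Big\|_{S_{2m}}.$$

Plugging the above expressions into (13) we can further estimate

$$E \le 4^{2m} C_m^{2m} \Big(E\Big\|\sum_{j=1}^N \epsilon_j \sum_{k=1}^N \hat{A}_{j,k}\Big\|_{S_{2m}}^{2m} + E\Big\|\sum_{j=1}^N \epsilon_j \sum_{k=1}^N \tilde{A}_{j,k}\Big\|_{S_{2m}}^{2m}\Big).$$

Using Khintchine's inequality (12) once more we obtain

$$E_1 := E\Big\|\sum_{j=1}^N \epsilon_j \sum_{k=1}^N \hat{A}_{j,k}\Big\|_{S_{2m}}^{2m} \le C_m^{2m} \max\Big\{\Big\|\Big(\sum_j \hat{B}_j^* \hat{B}_j\Big)^{1/2}\Big\|_{S_{2m}}^{2m}, \Big\|\Big(\sum_j \hat{B}_j \hat{B}_j^*\Big)^{1/2}\Big\|_{S_{2m}}^{2m}\Big\},$$

where $\hat{B}_j = \sum_{k=1}^N \hat{A}_{j,k}$. Using (14) we see that

$$\sum_j \hat{B}_j \hat{B}_j^* = \sum_{j,k} A_{j,k} A_{j,k}^*.$$

Furthermore, with the block matrix

$$F = \begin{pmatrix} \hat{B}_1 \\ \hat{B}_2 \\ \vdots \\ \hat{B}_N \end{pmatrix} = \begin{pmatrix} A_{1,1} & A_{1,2} & \cdots & A_{1,N} \\ A_{2,1} & A_{2,2} & \cdots & A_{2,N} \\ \vdots & \vdots & & \vdots \\ A_{N,1} & A_{N,2} & \cdots & A_{N,N} \end{pmatrix}$$

we have

$$\Big\|\Big(\sum_j \hat{B}_j^* \hat{B}_j\Big)^{1/2}\Big\|_{S_{2m}}^{2m} = \|(F^* F)^{1/2}\|_{S_{2m}}^{2m} = \|F\|_{S_{2m}}^{2m}.$$

As $\tilde{A}_{j,k}$ differs from $\hat{A}_{j,k}$ only by interchanging $A_{j,k}$ with $A_{j,k}^*$, we
obtain similarly

$$E_2 := E\Big\|\sum_{j=1}^N \epsilon_j \sum_{k=1}^N \tilde{A}_{j,k}\Big\|_{S_{2m}}^{2m} \le C_m^{2m} \max\Big\{\Big\|\Big(\sum_{j,k} A_{j,k}^* A_{j,k}\Big)^{1/2}\Big\|_{S_{2m}}^{2m}, \|F\|_{S_{2m}}^{2m}\Big\}.$$

Finally, we obtain

$$\begin{aligned}
E &\le 4^{2m} C_m^{2m} (E_1 + E_2) \\
&\le 2 \cdot 4^{2m} C_m^{4m} \max\Big\{\Big\|\Big(\sum_{j,k} A_{j,k} A_{j,k}^*\Big)^{1/2}\Big\|_{S_{2m}}^{2m}, \Big\|\Big(\sum_{j,k} A_{j,k}^* A_{j,k}\Big)^{1/2}\Big\|_{S_{2m}}^{2m}, \|F\|_{S_{2m}}^{2m}\Big\}.
\end{aligned}$$

This concludes the proof. ∎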
Repeating the above proof for the scalar case (which removes the
factor $\pi/2$ in the constant) and applying interpolation (see (16) and
(17) below) yields the following (compare also [27, Proposition 2.2]).

Corollary 4.5: Let $a_{j,k} \in \mathbb{C}$, $j,k = 1, \dots, N$, be numbers with
$a_{j,j} = 0$, $j = 1, \dots, N$. Let $\epsilon_k$, $k = 1, \dots, N$, be independent
Bernoulli ±1 random variables. Then for $2 \le p < \infty$ it holds

$$\Big(E\Big|\sum_{j,k=1}^N \epsilon_j \epsilon_k a_{j,k}\Big|^p\Big)^{1/p} \le d_p \Big(\sum_{j,k=1}^N |a_{j,k}|^2\Big)^{1/2},$$

where the constant

$$d_p = 4^{1/p}(4/e)\,p.$$
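For small N the moments in Corollary 4.5 can be computed exactly by enumerating all sign patterns, which gives a feeling for how much slack the constant $d_p$ leaves. The sketch below is illustrative; the function names `chaos_moment` and `d_const` and the random test coefficients are assumptions of this example.

```python
import itertools
import math
import numpy as np

def chaos_moment(a, p):
    """Exact (E|sum_{j,k} eps_j eps_k a[j, k]|^p)^(1/p) over all +-1 sign patterns."""
    N = a.shape[0]
    values = [
        abs(np.array(signs) @ a @ np.array(signs)) ** p
        for signs in itertools.product((-1.0, 1.0), repeat=N)
    ]
    return (sum(values) / 2**N) ** (1.0 / p)

def d_const(p):
    """Constant d_p = 4^(1/p) * (4/e) * p from Corollary 4.5."""
    return 4.0 ** (1.0 / p) * (4.0 / math.e) * p

rng = np.random.default_rng(5)        # fixed seed for reproducibility
N = 6
a = rng.standard_normal((N, N))
np.fill_diagonal(a, 0.0)              # the corollary assumes a zero diagonal
frobenius = math.sqrt(float((a**2).sum()))

for p in (2, 3, 4, 6):
    print(p, chaos_moment(a, p), d_const(p) * frobenius)
```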



V. Proof of the coherence estimate 
Now we are equipped to provide the proof of Proposition 3.2. An
inner product of two different columns $s_i$, $s_\ell$ of the normalized matrix
$\frac{1}{\sqrt{n}} S^b_\Omega$ has the form

$$\langle s_i, s_\ell\rangle = \frac{1}{n}\sum_{r \in \Omega} b_{i-r \bmod N}\, b_{\ell-r \bmod N} = \frac{1}{n}\sum_{j,k=1}^N b_j b_k a^S_{j,k},$$

where $a^S_{j,k} = 1$ if $(j,k) = (i - r \bmod N, \ell - r \bmod N)$ for
some $r \in \Omega$ and $a^S_{j,k} = 0$ otherwise. Similarly, the inner product
of the columns $t_i$, $t_\ell$ of the normalized matrix $\frac{1}{\sqrt{n}} T^c_\Omega$ can be written
as $\langle t_i, t_\ell\rangle = \frac{1}{n}\sum_{j,k=-N+1}^{N-1} c_j c_k a^T_{j,k}$ with $a^T_{j,k} = 1$ if $(j,k) =
(i - r, \ell - r)$ for some $r \in \Omega$ and $a^T_{j,k} = 0$ otherwise. Observe
that $\sum_{j,k} |a^S_{j,k}|^2 = \sum_{j,k} |a^T_{j,k}|^2 = |\Omega| = n$. Now let $b \in \mathbb{R}^N$ and
$c \in \mathbb{R}^{2N-1}$ be Rademacher series. Then Corollary 4.5 yields

$$n\big(E|\langle s_i, s_\ell\rangle|^p\big)^{1/p} = \Big(E\Big|\sum_{j,k} b_j b_k a^S_{j,k}\Big|^p\Big)^{1/p} \le 4^{1/p}(4/e)\,p\,\Big(\sum_{j,k} |a^S_{j,k}|^2\Big)^{1/2} = 4^{1/p}(4/e)\,p\,\sqrt{n}$$

for $p \ge 2$, and the same estimate holds for $E|\langle t_i, t_\ell\rangle|^p$. In order
to complete the proof we use the following simple and well-known
probability estimate, see e.g. [24], [21].

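Lemma 5.1: Suppose Z is a positive random variable satisfying
$(E Z^p)^{1/p} \le \alpha \beta^{1/p} p^{1/\gamma}$ for all $p_0 \le p < \infty$ and some $\alpha, \beta, \gamma > 0$.
Then for arbitrary $\kappa > 0$,

$$P(Z \ge e^{\kappa} \alpha u) \le \beta e^{-\kappa u^\gamma}$$

for all $u \ge p_0^{1/\gamma}$.

Proof: By Markov's inequality we obtain

$$P(Z \ge e^{\kappa} \alpha u) \le \frac{E Z^p}{(e^{\kappa} \alpha u)^p} \le \beta \Big(\frac{\alpha p^{1/\gamma}}{e^{\kappa} \alpha u}\Big)^p.$$

Choosing $p = u^\gamma$ yields the statement. ∎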
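Lemma 5.1 with the optimal choice $\kappa = 1$ yields

$$P\big(n|\langle s_i, s_\ell\rangle| \ge 4\sqrt{n}\,u\big) \le 4e^{-u}$$

for $u \ge 2$. Taking the union bound over all $N(N-1)/2$ possible pairs of
different columns $s_i$, $s_\ell$ we obtain

$$P\Big(\mu \ge \frac{4u}{\sqrt{n}}\Big) \le 2N^2 e^{-u}.$$

Set the right hand side to $\epsilon$. Then the resulting $u = \log(2N^2/\epsilon) \ge 2$
since we may assume without loss of generality that $N \ge 2$. We
obtain

$$P\Big(\mu \ge \frac{4\log(2N^2/\epsilon)}{\sqrt{n}}\Big) \le \epsilon.$$

The same holds for the coherence of $\frac{1}{\sqrt{n}} T^c_\Omega$.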



VI. Proof of Theorem 3.4

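We introduce the elementary shift operators on $\mathbb{R}^N$, $(S_j x)_\ell =
x_{\ell-j \bmod N}$, $j = 1, \dots, N$, and

$$(T_j x)_\ell = \begin{cases} x_{\ell-j} & \text{if } 1 \le \ell - j \le N, \\ 0 & \text{otherwise,} \end{cases}$$

for $j = -N+1, \dots, N-1$, $\ell = 1, \dots, N$. Further, denote by
$R_\Omega : \mathbb{R}^N \to \mathbb{R}^\Omega$ the operator that restricts a vector to the indices in
$\Omega$. Then we can write

$$S^b_\Omega = R_\Omega \sum_{j=1}^N \epsilon_j S_j \quad \text{and} \quad T^c_\Omega = R_\Omega \sum_{j=-N+1}^{N-1} \epsilon_j T_j,$$

where $(\epsilon_j)$ is a Rademacher sequence. Denote by A either $\frac{1}{\sqrt{n}} S_\Omega$ or
$\frac{1}{\sqrt{n}} T_\Omega$. We need to prove a bound on the operator norm of $X_\Lambda :=
A_\Lambda^* A_\Lambda - I_\Lambda$, where $I_\Lambda$ denotes the identity on $\mathbb{R}^\Lambda$. We introduce
$R_\Lambda^* : \mathbb{R}^\Lambda \to \mathbb{R}^N$ to be the extension operator that fills up a vector
in $\mathbb{R}^\Lambda$ with zeros outside $\Lambda$. Further, we denote by $D_j$ either $S_j$ or
$T_j$. Observe that

$$\begin{aligned}
A_\Lambda^* A_\Lambda &= \frac{1}{n}\sum_j \epsilon_j R_\Lambda D_j^* R_\Omega^* \sum_k \epsilon_k R_\Omega D_k R_\Lambda^* \\
&= \frac{1}{n}\sum_{j \neq k} \epsilon_j \epsilon_k R_\Lambda D_j^* P_\Omega D_k R_\Lambda^* + \frac{1}{n} R_\Lambda \Big(\sum_j D_j^* P_\Omega D_j\Big) R_\Lambda^*,
\end{aligned}$$

where $P_\Omega = R_\Omega^* R_\Omega$ denotes the projection operator which cancels
all components of a vector outside $\Omega$. Here and in the following the
sums range either over $\{1, \dots, N\}$ or over $\{-N+1, \dots, N-1\}$
depending on whether we consider circulant or Toeplitz matrices. It
is straightforward to check that

$$\sum_j D_j^* P_\Omega D_j = n I_N, \tag{15}$$

where $I_N$ is the identity on $\mathbb{R}^N$. Since $R_\Lambda R_\Lambda^* = I_\Lambda$ we obtain

$$X_\Lambda = \frac{1}{n}\sum_{j \neq k} \epsilon_j \epsilon_k R_\Lambda D_j^* P_\Omega D_k R_\Lambda^* = \frac{1}{n}\sum_{j \neq k} \epsilon_j \epsilon_k A_{j,k}$$

with $A_{j,k} = R_\Lambda D_j^* P_\Omega D_k R_\Lambda^*$. Our goal is to apply Theorem 4.3. To
this end we first observe that by (15), applied with $\Lambda$ in place of $\Omega$
and $D_j^*$ in place of $D_j$,

$$\sum_j A_{j,k}^* A_{j,k} = R_\Lambda D_k^* P_\Omega \Big(\sum_j D_j P_\Lambda D_j^*\Big) P_\Omega D_k R_\Lambda^* = s R_\Lambda D_k^* P_\Omega D_k R_\Lambda^*.$$

Using (15) once more this yields

$$\sum_{j,k} A_{j,k}^* A_{j,k} = s R_\Lambda \Big(\sum_k D_k^* P_\Omega D_k\Big) R_\Lambda^* = sn R_\Lambda R_\Lambda^* = sn I_\Lambda.$$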
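Identity (15) and the resulting formula $\sum_{j,k} A_{j,k}^* A_{j,k} = sn I_\Lambda$ can be confirmed numerically for small dimensions. The sketch below builds the shift operators as explicit matrices; all helper names and the chosen index sets are assumptions of this illustration, and indices are 0-based.

```python
import numpy as np

def circulant_shift(N, j):
    """Matrix of (S_j x)_l = x_{(l - j) mod N}."""
    return np.roll(np.eye(N), j, axis=0)

def toeplitz_shift(N, j):
    """Matrix of (T_j x)_l = x_{l - j} if l - j is a valid index, else 0."""
    return np.eye(N, k=-j)

def projection(N, index_set):
    """Diagonal projection that cancels all components outside index_set."""
    P = np.zeros((N, N))
    P[index_set, index_set] = 1.0
    return P

def check_identities(shifts, omega, lam):
    N = shifts[0].shape[0]
    n, s = len(omega), len(lam)
    P_omega = projection(N, omega)
    R_lam = np.eye(N)[lam, :]                      # restriction to the support
    lhs15 = sum(D.T @ P_omega @ D for D in shifts) # left hand side of (15)
    total = np.zeros((s, s))
    for Dj in shifts:
        for Dk in shifts:
            A_jk = R_lam @ Dj.T @ P_omega @ Dk @ R_lam.T
            total += A_jk.T @ A_jk
    return np.allclose(lhs15, n * np.eye(N)), np.allclose(total, s * n * np.eye(s))

N = 6
omega = np.array([0, 2, 3])                        # arbitrary row subset, n = 3
lam = np.array([1, 4])                             # arbitrary support, s = 2
circ = [circulant_shift(N, j) for j in range(1, N + 1)]
toep = [toeplitz_shift(N, j) for j in range(-N + 1, N)]
print(check_identities(circ, omega, lam), check_identities(toep, omega, lam))
```

Both families satisfy the identities because every index is shifted into $\Omega$ by exactly n of the shifts, and into $\Lambda$ by exactly s of them.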

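Since the entries of all matrices $A_{j,k}$ are non-negative we get

$$\Big\|\Big(\sum_{j \neq k} A_{j,k}^* A_{j,k}\Big)^{1/2}\Big\|_{S_{2m}}^{2m} = \operatorname{Tr}\Big(\sum_{j \neq k} A_{j,k}^* A_{j,k}\Big)^m \le \operatorname{Tr}\Big(\sum_{j,k} A_{j,k}^* A_{j,k}\Big)^m = \operatorname{Tr}(sn I_\Lambda)^m = s^{m+1} n^m,$$

where Tr denotes the trace. Furthermore, since $A_{j,k}^* = A_{k,j}$ we have
$\sum_{j \neq k} A_{j,k}^* A_{j,k} = \sum_{j \neq k} A_{j,k} A_{j,k}^*$. Let F denote the block matrix
$F = (\tilde{A}_{j,k})_{j,k}$ where $\tilde{A}_{j,k} = A_{j,k}$ if $j \neq k$ and $\tilde{A}_{j,j} = 0$. Using
once again that the entries of all matrices are non-negative we obtain

$$\begin{aligned}
\|F\|_{S_{2m}}^{2m} &= \operatorname{Tr}[(F^* F)^m] \\
&= \operatorname{Tr} \sum_{k_1, \dots, k_m} \sum_{j_1, \dots, j_m} \tilde{A}_{j_1,k_1}^* \tilde{A}_{j_1,k_2} \tilde{A}_{j_2,k_2}^* \tilde{A}_{j_2,k_3} \cdots \tilde{A}_{j_m,k_m}^* \tilde{A}_{j_m,k_1} \\
&\le \operatorname{Tr} \sum_{k_1, \dots, k_m} \Big(\sum_{j_1} A_{j_1,k_1}^* A_{j_1,k_2}\Big) \cdots \Big(\sum_{j_m} A_{j_m,k_m}^* A_{j_m,k_1}\Big) \\
&= s^m \operatorname{Tr} \sum_{k_1, \dots, k_m} \big[R_\Lambda D_{k_1}^* P_\Omega D_{k_2} R_\Lambda^* R_\Lambda D_{k_2}^* P_\Omega D_{k_3} R_\Lambda^* \cdots R_\Lambda D_{k_m}^* P_\Omega D_{k_1} R_\Lambda^*\big],
\end{aligned}$$

where we applied also (15) once more. Using the cyclicity of the trace
and applying (15) another time, together with the fact that $T_k^* = T_{-k}$
and $S_k^* = S_{N-k}$, gives

$$\|F\|_{S_{2m}}^{2m} \le s^m \operatorname{Tr}\Big[\sum_{k_1} D_{k_1} P_\Lambda D_{k_1}^* P_\Omega \sum_{k_2} D_{k_2} P_\Lambda D_{k_2}^* P_\Omega \cdots \sum_{k_m} D_{k_m} P_\Lambda D_{k_m}^* P_\Omega\Big] = s^{2m} \operatorname{Tr}[P_\Omega] = n s^{2m}.$$

Since by assumption (5) $s \le n$, it follows that $\|F\|_{S_{2m}}^{2m} \le n s^{2m} \le s^{m+1} n^m$,
which is the same bound as obtained above for

$$\Big\|\Big(\sum_{j \neq k} A_{j,k}^* A_{j,k}\Big)^{1/2}\Big\|_{S_{2m}}^{2m} = \Big\|\Big(\sum_{j \neq k} A_{j,k} A_{j,k}^*\Big)^{1/2}\Big\|_{S_{2m}}^{2m} \le s^{m+1} n^m.$$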

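Using $\|X_\Lambda\| = \|X_\Lambda\|_{S_\infty} \le \|X_\Lambda\|_{S_p}$ and applying the Khintchine
inequality in Theorem 4.3 we obtain for an integer m

$$\begin{aligned}
E\|X_\Lambda\|^{2m} = E\|A_\Lambda^* A_\Lambda - I_\Lambda\|^{2m} &\le E\|A_\Lambda^* A_\Lambda - I_\Lambda\|_{S_{2m}}^{2m} \\
&= \frac{1}{n^{2m}} E\Big\|\sum_{j \neq k} \epsilon_j \epsilon_k A_{j,k}\Big\|_{S_{2m}}^{2m} \le 2(2\pi)^{2m} \Big(\frac{(2m)!}{2^m m!}\Big)^2 \frac{s^{m+1}}{n^m}.
\end{aligned}$$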
Stirling's formula gives

$$\frac{(2m)!}{2^m m!} = \frac{\sqrt{2\pi\, 2m}\,(2m/e)^{2m} e^{\lambda_{2m}}}{2^m \sqrt{2\pi m}\,(m/e)^m e^{\lambda_m}} \le \sqrt{2}\,(2/e)^m m^m, \tag{16}$$

where $\frac{1}{12m+1} \le \lambda_m \le \frac{1}{12m}$. An application of Hölder's inequality
yields for $\theta \in [0, 1]$ and an arbitrary random variable Z,

$$E|Z|^{2m+2\theta} = E\big[|Z|^{(1-\theta)2m} |Z|^{\theta(2m+2)}\big] \le (E|Z|^{2m})^{1-\theta} (E|Z|^{2m+2})^{\theta}. \tag{17}$$
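Combining our estimates above gives

$$\begin{aligned}
E\|X_\Lambda\|^{2m+2\theta} &\le (E\|X_\Lambda\|^{2m})^{1-\theta} (E\|X_\Lambda\|^{2m+2})^{\theta} \\
&\le 4(2\pi)^{2m+2\theta} (2/e)^{2m+2\theta}\, m^{2m(1-\theta)} (m+1)^{2\theta(m+1)}\, \frac{s^{m+\theta+1}}{n^{m+\theta}} \\
&\le 4(2\pi)^{2m+2\theta} (2/e)^{2m+2\theta}\, (m+\theta)^{2m+2\theta}\, \frac{s^{m+\theta+1}}{n^{m+\theta}},
\end{aligned}$$

where we used the inequality between the geometric and arithmetic
mean in the third step. In other words, for $p \ge 2$,

$$(E\|X_\Lambda\|^p)^{1/p} \le \frac{2\pi}{e} \sqrt{\frac{s}{n}}\, (4s)^{1/p}\, p.$$

An application of Lemma 5.1 with the optimal value $\kappa = 1$ yields

$$P\Big(\|X_\Lambda\| \ge 2\pi \sqrt{\frac{s}{n}}\, u\Big) \le 4s e^{-u}$$

for all $u \ge 2$. Setting the right hand side equal to $\epsilon$ shows that $\|X_\Lambda\| \le
\delta$ with probability at least $1 - \epsilon$ provided

$$n \ge (2\pi)^2 \delta^{-2} s \log^2(4s/\epsilon).$$

This completes the proof of Theorem 3.4.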